Abstract

In recent years, whole-genome gene expression analysis has become a highly important tool to study the coordinated function of a very large number of genes within their corresponding cellular environment, especially in relation to phenotypic diversity and disease. A wide variety of quantitative methods have been developed to cope with the high-throughput data sets generated by gene expression profiling experiments. Due to the complexity associated with transcriptomics, especially in the case of gene regulation phenomena, most of these methods are of a probabilistic or statistical nature. Even if these methods have reached a central status in the development of an integrative, systematic understanding of the associated biological processes, they very rarely constitute a concrete guide to the actual physicochemical mechanisms behind biological function, and their role is mostly one of hypothesis generation. An important improvement could be achieved with the development of a thermodynamic theory of gene expression and transcriptional regulation that would build the foundations for a proper integration of the vast amount of molecular biophysical data and could lead, in the future, to a systemic view of genetic transcription and regulation.

1 Introduction 

Cellular phenotypes are mainly determined by the expression levels of many genes and of their products, such as enzymes and other proteins. One important tool to track down this cellular phenotypic diversity is gene expression analysis. One hard-to-grasp issue is that the process of gene expression is by itself a complex one, from both the biochemical and the thermodynamical points of view. The transcription of messenger RNA (mRNA) for a given gene from a DNA template is often regulated by different genes and their products. This being the case, a variety of physicochemical interactions couple the abundances of different genetic transcripts, and it is a recognized fact that such complex processes are behind the ultimate mechanisms of cell function. Under this scenario, gene expression values are measured under different conditions, either simultaneously (steady state) or serially (dynamics), and in many cases the measurements are then treated as samples from a joint probability distribution. Genome-wide transcriptional profiling, also called Gene Expression Analysis (GEA), has allowed us to go well beyond studying gene expression at the level of individual components of a given process by providing global information about functional connections between genes, mRNAs and the related regulatory proteins. GEA has greatly increased our understanding of the interplay between different events in gene regulation and has pointed to previously unappreciated functional relations, such as the coupling between nuclear and cytoplasmic transcription and metabolic processes [2]. GEA has also revealed extensive communication within regulatory units, for example in the organization of transcription factors into regulatory motifs.

The transcriptional behavior of every gene is simultaneously regulated by both its chromatin structure and its associated transcription factors. In eukaryotes (organisms with a cellular nucleus), for example, genomic DNA is packaged into nucleosomes, which are made of DNA wrapped around octamers of a class of proteins called histones. Another set of proteins, called chromatin modifiers, are able to move histones along the DNA chain to expose specific regions and then replace histones with specific histone variants, converting chromatin from a transcriptionally repressed state into a transcriptionally accessible state and hence enabling gene expression. Transcription factors (TFs), in turn, bind at regulatory regions to either activate or repress the transcription of their target genes, by promoting or inhibiting, respectively, the recruitment of RNA polymerase II. TFs also recruit chromatin-modifying enzymes to make their target DNA more accessible to the transcriptional machinery (for a more detailed account see Section 3). In the past, the different steps involved in the regulation of gene expression (transcription, mRNA processing, nuclear export, translation and degradation) were usually analyzed in isolation using conventional biochemical techniques. This way of looking at things gave the impression that such processes are independent. Earlier investigations focused on the mechanisms underlying the expression of individual genes or, at best, on the behavior of a small set of genes, rather than exploring regulatory mechanisms that can influence many genes at one time. Systematic studies of genome-wide binding patterns made evident the existence of a great deal of coordinated regulation among TFs. Factors that combinatorially (and concomitantly) regulate a particular gene also often coordinately regulate the expression of other genes, potentially even themselves or each other. Given this fact, they are not independent inputs that merge only at a particular promoter, but are rather coupled. These complex phenomena will ultimately affect a thermodynamical description of transcriptional regulation, because the concentrations (expression levels) and chemical potentials of mRNA transcripts are combinatorially correlated.

Nevertheless, even though we now have experimental techniques to measure the behavior of thousands of mRNA transcripts simultaneously, and a great deal of attention has been paid to the computational and statistical analysis of such huge amounts of data, the theoretical approach still looks at regulatory interactions on a one-by-one basis. This approach is, of course, changing towards a more systematic, network-oriented understanding of gene regulation phenomena. One usual means to understand the nature of such intricate phenomena is the use of so-called Gene Regulatory Networks (GRNs). GRNs are powerful graph-theoretical constructs that describe the integrated status of a cell under a specific condition at a given time [3]. The description given by GRNs generally consists in identifying gene interactions from experimental data through the use of theoretical models and computational analysis. Transcriptional network analyses have shown that, instead of being independent, different levels of gene regulation are strongly coupled. In some cases, it has been recognized that the factors involved in a specific stage of mRNA transcription can exhibit coordinated behavior, for example, groups of transcription factors that bind cooperatively at many related promoters.



2 Thermodynamics of hybridization 

Understanding the thermodynamical basis of the hybridization process is an important task related both to the intrinsic mechanisms of gene expression and to its experimental measurement, especially in the case of high-throughput technologies such as gene chips. One initial approach is to calculate hybridization thermodynamics by inferring free energies from the energetic cost of base-pair opening in the RNA complex [4]. This approach has also been applied to understand the selective hybridization processes related to mRNA silencing (gene switching) by small interfering RNAs (siRNAs), which are RNA molecules that bind (hybridize) to specific mRNA transcripts, thus preventing their ultimate translation into proteins [5]. In both scenarios, the thermodynamic equilibrium and its properties are important to understand and quantify the degree of hybridization, its specificity, and the steady-state concentration of mRNA transcripts after either the measurement process or the silencing, respectively. In the present paper, we are more interested in the thermodynamics associated with gene expression quantification and profiling in high-throughput experiments, since this is (at least at the moment) the main and most accurate laboratory tool to study the mechanisms of genetic transcription.

According to the Langmuir adsorption model of oligonucleotide hybridization, the specific-hybridization intensity (or gene-expression signal) $\psi$ for a gene probe, as measured by (for example) an Affymetrix-type gene chip, is given by [6]:

$$\psi = \frac{A\,c\,e^{-\beta\Delta G}}{1 + c\,e^{-\beta\Delta G}} \tag{1}$$

where $\beta = 1/RT$, $T$ is the local temperature, $R$ is the gas constant, $c$ is the mRNA concentration for this species, $\Delta G$ is the free energy of hybridization, and $A$ is a parameter that sets the scale of the intensity corresponding to the saturation limit $c \gg e^{\beta\Delta G}$. A natural generalization of Eq. (1) for a probe $i$ within a set of $M$ gene probes ($i = 1, \dots, M$) is:

$$\psi_i = \frac{A_i\,c_i\,e^{-\beta\Delta G_i}}{1 + c_i\,e^{-\beta\Delta G_i}} \tag{2}$$
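A minimal Python sketch of the Langmuir isotherm of Eq. (2) may help to see its two regimes: a linear response $\psi_i \approx A_i c_i e^{-\beta\Delta G_i}$ at low concentration and saturation $\psi_i \to A_i$ at high concentration. The function name, the temperature (310 K) and the numerical values of $A_i$, $c_i$ and $\Delta G_i$ are illustrative assumptions, not values taken from any particular chip.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def langmuir_intensity(c, A, dG, T=310.0):
    """Specific-hybridization intensity psi of Eq. (2) for one probe.

    c  : target mRNA concentration (arbitrary units)
    A  : saturation intensity scale
    dG : hybridization free energy in J/mol (negative for favourable binding)
    T  : temperature in K (310 K is only an illustrative choice)
    """
    w = c * math.exp(-dG / (R * T))  # c * exp(-beta * dG), with beta = 1/(R T)
    return A * w / (1.0 + w)


# Hypothetical probe: the intensity grows linearly at first and saturates towards A.
A, dG = 1000.0, -20000.0
for c in (1e-9, 1e-6, 1e-3, 1.0):
    print(f"c = {c:g}  psi = {langmuir_intensity(c, A, dG):.4g}")
```

The same function applies to every probe $i$; only $A_i$ and $\Delta G_i$ change from probe to probe.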



The local chemical potential $\mu_i$ of species $i$ due to the hybridization process is defined, as customarily, as the derivative of the free energy with respect to the concentration $c_i$. From Eq. (2) it is possible to calculate $\mu_i$ by means of the chain rule as follows:
or, in terms of the direct derivatives:
The first derivative is calculated as:
If we rearrange terms:
which can then be expressed in terms of $\psi_i$ to read:
Now, the second derivative in the chain-rule product is given by:
and then simplifies to:
This result can also be written in terms of $\psi_i$,
which gives as a result that:
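Recalling Eqs. (4), (7) and (11), we finally get:

$$\mu_i = -\frac{RT}{\psi_i}\left(1-\frac{\psi_i}{A_i}\right)^{-1}\frac{\psi_i}{c_i}\left(1-\frac{\psi_i}{A_i}\right) \tag{12}$$

or

$$\mu_i = -\frac{RT}{c_i} \tag{13}$$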
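Eq. (13) can be checked numerically under one explicit reading of the chain rule, which is an assumption made here for illustration: $\mu_i$ is taken as $(\partial \Delta G_i/\partial \psi_i)_{c_i}\,(\partial \psi_i/\partial c_i)_{\Delta G_i}$, with $\Delta G_i(\psi_i, c_i)$ obtained by solving Eq. (2) for the free energy. The sketch evaluates both factors by central finite differences and compares their product with $-RT/c_i$; all numerical values are hypothetical.

```python
import math

R, T = 8.314, 310.0   # J/(mol K) and an illustrative temperature in K
RT = R * T


def psi_of(c, A, dG):
    """Eq. (2): Langmuir intensity for concentration c."""
    w = c * math.exp(-dG / RT)
    return A * w / (1.0 + w)


def dG_of(psi, c, A):
    """Eq. (2) solved for the free energy: exp(-dG/RT) = psi / (c (A - psi))."""
    return -RT * math.log(psi / (c * (A - psi)))


def mu_chain_rule(c, A, dG, h=1e-6):
    """Assumed reading of the chain rule: mu = (d dG/d psi)_c * (d psi/d c)_dG.

    Both factors are central finite differences with relative step h.
    """
    p = psi_of(c, A, dG)
    dpsi_dc = (psi_of(c * (1 + h), A, dG) - psi_of(c * (1 - h), A, dG)) / (2 * c * h)
    ddG_dpsi = (dG_of(p * (1 + h), c, A) - dG_of(p * (1 - h), c, A)) / (2 * p * h)
    return ddG_dpsi * dpsi_dc


for c in (1e-5, 2e-4, 5e-3):
    # finite-difference product versus the closed form of Eq. (13)
    print(c, mu_chain_rule(c, 1000.0, -20000.0), -RT / c)
```

Analytically, the two factors are $(\psi_i/c_i)(1-\psi_i/A_i)$ and $-(RT/\psi_i)(1-\psi_i/A_i)^{-1}$, so the dependence on $\psi_i$ and $A_i$ cancels exactly.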

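It is interesting to notice that this level of description (the two-state Langmuir adsorption model) gives an expression (Eq. (13)) for the chemical potential that is equivalent to that of an ideal gas, i.e. of non-interacting species: if we calculate the equilibrium chemical-work contribution to the free energy, $E_i$, we obtain a purely logarithmic dependence on concentration,

$$E_i = \int_{c_i^0}^{c_i} \mu_i\,dc_i' = -RT\int_{c_i^0}^{c_i}\frac{dc_i'}{c_i'} \tag{14}$$

or

$$E_i = -RT\ln\left(\frac{c_i}{c_i^0}\right) \tag{15}$$

where $c_i^0$ denotes a reference concentration.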

This approximation is valid as long as the rate of cross-hybridization stays low since, if there is only (or mostly) transcript-specific hybridization, the chemical species (in this case the different mRNA molecules) can be considered non-interacting. This is a realistic assumption, given the low concentration of every transcript in solution and the fact that current technologies are very efficient in reducing the rate of non-specific hybridization [6].
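The logarithmic, ideal-gas-like form of Eq. (15) can be checked by integrating $\mu_i = -RT/c_i$ numerically between a reference concentration and $c_i$. The trapezoidal rule, the grid size, the temperature and the concentrations used below are illustrative choices.

```python
import math

RT = 8.314 * 310.0  # J/mol at an illustrative 310 K


def chemical_work(c, c_ref, n=20_000):
    """Trapezoidal estimate of the integral of mu(c') = -RT/c' from c_ref to c (Eq. (14))."""
    h = (c - c_ref) / n
    total = 0.5 * (-RT / c_ref - RT / c)
    for k in range(1, n):
        total += -RT / (c_ref + k * h)
    return total * h


c_ref = 1e-9
for c in (2e-9, 1e-8, 1e-7):
    # numerical integral versus the closed form -RT ln(c / c_ref) of Eq. (15)
    print(c, chemical_work(c, c_ref), -RT * math.log(c / c_ref))
```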
3 Transcriptional regulation

The phenomenon of gene expression (in this context also referred to as mRNA transcription, or simply transcription) is a complex one. A set of control mechanisms, collectively called transcriptional regulation, controls when transcription occurs and how much mRNA is produced. The transcription of a given gene by the RNA polymerase enzyme (RNApol) can be regulated by at least five different biochemical mechanisms:

• There exists a set of proteins called specificity factors that alter the specific binding of RNApol to a given promoter or set of promoters. A promoter is a DNA region located next to a gene (technically upstream and in cis, i.e. towards the 5' region of the sense strand) that facilitates its transcription by making that region easy to recognize by the transcriptional machinery.

• Repressors are DNA-binding proteins that regulate the expression of one or more genes by decreasing the rate of transcription. The actual mechanism involves their attachment to an operator, hence preventing the transcription of the adjacent segment of DNA by blocking the passage of RNApol.

• Transcription factors are proteins that bind to specific DNA sequences in order to control the rate of transcription. Transcription factors are able to perform their function alone or by forming a complex with other proteins. They bind to either enhancer or promoter regions of DNA adjacent to the genes that they regulate. Depending on the transcription factor, the transcription of the adjacent gene is either up-regulated (i.e. higher concentrations of the corresponding mRNA) or down-regulated (lower concentrations of mRNA). Transcription factors use a variety of mechanisms for the regulation of gene expression. These mechanisms include stabilizing or blocking the binding of RNApol to DNA, and catalyzing the acetylation or deacetylation of histone proteins. The transcription factor can do this either alone or by recruiting other proteins that possess catalytic activity.

• The DNA-binding proteins that enhance the interaction of RNApol with a particular promoter region, thus increasing the expression level of the associated gene, are called activators. Activators perform their work either through electrostatic interactions with some subunits of RNApol (attracting the molecule towards them and hence towards the DNA region they are bound to) or by inducing conformational changes in the structure of DNA that facilitate its binding to RNApol.

• Finally, enhancers are regions in DNA that are able to bind activators, hence bringing promoters to the initiation complex.
4 Non-equilibrium thermodynamics for small reactive systems: the Transcriptional Regulation scenario

As is already evident from the previous section, the process of gene regulation within a cell is highly complex from the biophysicochemical point of view. Another source of complexity in the non-equilibrium thermodynamical characterization of such a system lies in the fact that a cell is a small system, in the sense that its dimensions do not permit an obvious application of the thermodynamic limit. Specifically, the role of fluctuations and stochasticity in such scenarios is not clear. The thermodynamics of small systems at equilibrium has been studied in the past [10, 11], and some results were even expected to extend to local-equilibrium settings within cell-sized biosystems [12]. One important limitation for the development of such theoretical frameworks at that time was the lack of proper experimental settings to test their hypotheses. With the development of modern techniques, such as microscopic manipulation by means of atomic force microscopes, optical tweezers and cold traps, this has become less of a limitation. In the meantime, theories have been developed to explain several results. These include mesoscopic thermodynamical approaches [13, 16] and studies based on the so-called fluctuation theorems [17-20]. Some of these theoretical results have even been tested experimentally.

4.1 Fluctuation phenomena in non-equilibrium systems 

To get a better idea of the role of large local fluctuations in small systems, let us recall an ideal gas composed of $N$ particles. The total energy of the system is an (approximately) Gaussian-distributed random variable with average $\langle e\rangle \sim N k_B T$ and variance $\sigma^2 \sim N k_B^2 T^2$. In this (general) case the absolute fluctuations of the system are proportional to $N^{1/2}$, so the relative fluctuations $\sigma/\langle e\rangle$ scale as $N^{-1/2}$. This means that for systems of size $N \sim O(1)$ the fluctuations are comparable to the average (and thus important!), whereas for a system with $N \sim O(10^{23})$ these same fluctuations become negligible. An interesting case study is the cellular behavior of the RNApol molecule already mentioned. As we have said, RNApol is an enzyme that moves along the DNA to produce a newly synthesized mRNA molecule. It has been noted that RNApol extracts energy from its surrounding thermal bath (i.e. the cellular environment) to move, and at the same time uses bond hydrolysis to ensure that only the thermal fluctuations that lead to forward movement are captured. RNApol thus serves as an out-of-equilibrium thermal rectifier. The complex dynamics behind even this (relatively) simple model of transcription demonstrates the need for a non-equilibrium thermodynamical characterization that can deal with fluctuations in small systems.
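The $N^{-1/2}$ scaling can be illustrated with a classical monatomic ideal gas in the canonical ensemble, for which the total kinetic energy is Gamma-distributed with shape $3N/2$ and scale $k_BT$, so that $\sigma/\langle e\rangle = \sqrt{2/(3N)}$. The sketch compares this closed form with a small Monte Carlo estimate; the choice of ensemble, the sample size and the random seed are assumptions made only for illustration.

```python
import math
import random


def relative_energy_fluctuation(N):
    """sigma/<E> for N classical monatomic ideal-gas particles (canonical ensemble).

    <E> = (3/2) N kT and var(E) = (3/2) N (kT)^2, hence sigma/<E> = sqrt(2/(3N)).
    """
    return math.sqrt(2.0 / (3.0 * N))


def sampled_relative_fluctuation(N, samples=20_000, kT=1.0, seed=0):
    """Monte Carlo estimate: E is Gamma-distributed with shape 3N/2 and scale kT."""
    rng = random.Random(seed)
    es = [rng.gammavariate(1.5 * N, kT) for _ in range(samples)]
    mean = sum(es) / samples
    var = sum((e - mean) ** 2 for e in es) / (samples - 1)
    return math.sqrt(var) / mean


for N in (1, 100, 10_000):
    print(N, relative_energy_fluctuation(N), sampled_relative_fluctuation(N))
print("N ~ 6e23:", relative_energy_fluctuation(6.0e23))
```

For a handful of particles the relative fluctuation is of order one, while for a macroscopic sample it is of order $10^{-12}$.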

A very important concept in the setting of non-equilibrium fluctuations is that of a control parameter. Roughly speaking, a control parameter is a variable that must be specified to define unambiguously the state of a non-equilibrium system, i.e. control parameters are non-fluctuating variables. Let $s_n$ ($n = 1, \dots, p$) be the set of parameters of a non-equilibrium system and $x$ the control parameter. If we vary $x$, the total energy $E$ of the system will vary accordingly as:

$$dE = \sum_{n=1}^{p}\frac{\partial E}{\partial s_n}\,ds_n + \frac{\partial E}{\partial x}\,dx \tag{16}$$
One can see that the first term(s) correspond to the variation of energy as a result of internal configurations (we naively call this the heat), whereas the second term is the energy change due to an external perturbation (the work). Of course, this formulation implies the experimental difficulty of finding an appropriate (natural) control parameter without disturbing the system too much. However, since thermal rectification phenomena are present in non-equilibrium small systems, Eq. (16) will serve as a basis for the extended irreversible thermodynamical description below.
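To make the heat/work split of Eq. (16) concrete, the sketch below uses a hypothetical one-dimensional model, $E(s, x) = \tfrac{1}{2}K(s - x)^2$, in which $s$ plays the role of a single internal variable ($p = 1$) and $x$ is the control parameter (think of a bead in an optical trap whose centre is moved). The stiffness, the protocol and the lagging trajectory $s(t)$ are illustrative assumptions; derivatives are evaluated at interval midpoints, which makes heat plus work reproduce $\Delta E$ exactly for this quadratic energy.

```python
import math

K = 2.0  # hypothetical trap stiffness


def energy(s, x):
    """Toy energy: internal coordinate s in a harmonic trap centred at the control parameter x."""
    return 0.5 * K * (s - x) ** 2


def heat_and_work(s_path, x_path):
    """Accumulate the two terms of Eq. (16) along a discretised trajectory."""
    q = w = 0.0
    for k in range(len(s_path) - 1):
        ds = s_path[k + 1] - s_path[k]
        dx = x_path[k + 1] - x_path[k]
        gap = 0.5 * (s_path[k] + s_path[k + 1]) - 0.5 * (x_path[k] + x_path[k + 1])
        q += K * gap * ds    # (dE/ds) ds: change due to the internal configuration ("heat")
        w += -K * gap * dx   # (dE/dx) dx: change due to the external perturbation ("work")
    return q, w


ts = [0.01 * k for k in range(101)]
x_path = ts[:]                                   # control parameter dragged at unit speed
s_path = [t - (1.0 - math.exp(-t)) for t in ts]  # internal coordinate lagging behind the trap
q, w = heat_and_work(s_path, x_path)
dE = energy(s_path[-1], x_path[-1]) - energy(s_path[0], x_path[0])
print(q, w, q + w, dE)
```

Dragging the trap does positive work on the system, part of which is returned as (negative) heat while the internal coordinate lags behind.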

4.2 Mesoscopic non-equilibrium thermodynamics 

As has already been said, systems outside the realm of the thermodynamic limit are characterized by large fluctuations and hence stochastic effects. The classical thermodynamic theory of irreversible processes (CIT) [9] gives a rough, coarse-grained description of systems, one that ignores all the details of the molecular nature of matter, hence studying it as a continuous medium by means of a phenomenological field theory. As such, CIT is not suitable for the description of small systems, because fluctuations, which CIT ignores, could become the dominant factor in the system's dynamical evolution and response. Nevertheless, in many instances (such as the present case of gene expression regulation) it would be desirable to have a thermodynamic framework to study such so-called nanosystems. One possible way to do so is to consider the stochastic nature of the time evolution of small non-equilibrium systems. This is the approach followed by Mesoscopic Non-Equilibrium Thermodynamics (MNET) [13]. MNET for small systems can be understood as an extension of the equilibrium thermodynamics of small systems developed by Hill and co-workers [10-12].

Stochasticity comes into play through the recognition that scaling down the description of a physical system brings up energy contributions that are usually neglected in thermodynamical descriptions, either in or out of equilibrium. These contributions take the form of, for example, surface energies, and in turn disrupt the canonical view of extensivity. An example used by Hill [11] is that of a small cluster of $N$ identical particles for which the equilibrium Gibbs energy is given as $G = \mu N + aN^{\delta}$, with $\mu$ the chemical potential, $a$ an arbitrary adjusting function and $\delta < 1$ a size-effect exponent. Here, the second term represents the energies that are usually disregarded; their effects become negligible for very large $N$, since the first term then becomes dominant. In this way, at the thermodynamic limit one recovers the usual relation $G = \mu N$. It is then possible to treat the Gibbs energy as a fluctuating quantity. Of course, we can adjust the definition of the chemical potential to account for these effects.
Defining

$$\hat{\mu} = \mu + aN^{\delta-1} \tag{17}$$

it is possible to recover the standard Euler relation $G = \hat{\mu}N$. However, one must be cautious: even if $\hat{\mu}$ accounts for the actual energy potential involved in the thermochemical description of such a small system, it is NOT a canonical chemical potential since, for instance, it does not give rise to an extensive thermodynamical description. Of course, in the thermodynamic limit $\hat{\mu} \to \mu$.
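The bookkeeping behind Eq. (17) can be checked with a few lines of Python. Writing the size-effect exponent as $\delta$, the Euler form $G = \hat\mu N$ holds exactly at any $N$ once $\hat\mu = \mu + aN^{\delta-1}$, while $\hat\mu$ approaches $\mu$ only as $N$ grows and $G$ itself is not extensive. The values of $\mu$, $a$ and $\delta$ are illustrative; $\delta = 2/3$ is chosen because it mimics a surface term for a compact three-dimensional cluster.

```python
def hill_gibbs(N, mu, a, delta):
    """Hill's small-cluster Gibbs energy G = mu*N + a*N**delta, with delta < 1."""
    return mu * N + a * N ** delta


def effective_mu(N, mu, a, delta):
    """Eq. (17): mu_hat = mu + a*N**(delta - 1), so that G = mu_hat * N."""
    return mu + a * N ** (delta - 1.0)


mu, a, delta = -1.0, 0.5, 2.0 / 3.0
for N in (10, 1_000, 100_000, 10**9):
    mu_hat = effective_mu(N, mu, a, delta)
    # the Euler relation holds by construction; mu_hat -> mu as the size effect fades
    print(N, mu_hat, hill_gibbs(N, mu, a, delta) - mu_hat * N)
```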



In the same vein, MNET was developed to characterize non-equilibrium small systems. Let us recall that any reduction of the spatio-temporal scale of description of a system entails an increase in the number of non-coarse-grained degrees of freedom (we are, so to speak, looking at things in more detail). These degrees of freedom could be related to the extended variables of Extended Irreversible Thermodynamics [23], but they could also be more microscopic in nature, such as colloidal-particle velocities, orientational states in a quasi-crystal, and so on. Hence, in order to characterize such variables, let us say that there exists a set $\Gamma = \{\gamma_i\}$ of such non-equilibrated degrees of freedom, and let $P(\Gamma, t)$ be the probability density that the system is in a state given by $\Gamma$ at time $t$. If one assumes [14] that the evolution of these degrees of freedom can be described as a diffusion process in $\Gamma$-space, then the corresponding Gibbs equation can be written as:



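$$\delta S = -\frac{1}{T}\int \mu(\Gamma)\,\delta P(\Gamma,t)\,d\Gamma \tag{18}$$

where $\mu(\Gamma)$ is a generalized chemical potential related to the probability density, whose time-dependent expression can be written explicitly as:

$$\mu(\Gamma,t) = k_B T\ln\frac{P(\Gamma,t)}{P_{\mathrm{eq}}(\Gamma)} + \mu_{\mathrm{eq}} \tag{19}$$

or, in terms of a non-equilibrium work term $\Delta W$:

$$\mu(\Gamma,t) = k_B T\ln P(\Gamma,t) + \Delta W(\Gamma) \tag{20}$$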

The time evolution of the system can be described as a generalized diffusion process over a potential landscape in the space of mesoscopic variables $\Gamma$. This process is driven by a generalized mesoscopic thermodynamic force $-\partial(\mu/T)/\partial\Gamma$, whose explicit stochastic origin can be traced back by means of a Fokker-Planck-like analysis [13, 14]. MNET thus seems to be a good candidate theory for describing the non-equilibrium thermodynamics of small systems, provided one has a suitable model or microscopic means to infer the probability distribution $P(\Gamma, t)$.
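As a sketch of Eq. (20), the snippet below discretises a one-dimensional $\Gamma$-space, takes a hypothetical double-well work landscape $\Delta W(\Gamma)$ and evaluates $\mu(\Gamma) = k_BT\ln P + \Delta W$ in units where $k_BT = 1$. For the Boltzmann distribution $P \propto e^{-\Delta W/k_BT}$ the chemical potential is flat, so there is no driving force; any other distribution produces gradients in $\mu$ that drive diffusion in $\Gamma$-space. The landscape, the grid and the units are assumptions for illustration.

```python
import math


def mnet_chemical_potential(P, dW, kT=1.0):
    """Eq. (20) on a grid: mu(Gamma) = kT ln P(Gamma) + Delta W(Gamma)."""
    return [kT * math.log(p) + w for p, w in zip(P, dW)]


def boltzmann(dW, kT=1.0):
    """Normalised equilibrium distribution P_eq proportional to exp(-Delta W / kT)."""
    weights = [math.exp(-w / kT) for w in dW]
    Z = sum(weights)
    return [x / Z for x in weights]


gammas = [-2.0 + 0.1 * k for k in range(41)]
dW = [(g * g - 1.0) ** 2 for g in gammas]    # hypothetical double-well landscape
P_eq = boltzmann(dW)
mu_eq = mnet_chemical_potential(P_eq, dW)    # flat: no thermodynamic force at equilibrium
P_neq = [1.0 / len(gammas)] * len(gammas)    # a uniform, non-equilibrium distribution
mu_neq = mnet_chemical_potential(P_neq, dW)  # non-uniform: gradients drive diffusion
print(max(mu_eq) - min(mu_eq), max(mu_neq) - min(mu_neq))
```

The flat equilibrium value equals $-k_BT\ln Z$, which is the constant $\mu_{\mathrm{eq}}$ that makes Eqs. (19) and (20) coincide.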



One important setting where MNET seems appropriate is the case of activated processes, such as a system crossing a potential barrier. Chemical reactions (including biochemical reactions such as those involved in gene regulation) clearly belong to this class. According to [14], the diffusion current in this $\Gamma$-space can be written in terms of a local fugacity defined as:

$$z(\Gamma) = \exp\left(\frac{\mu(\Gamma)}{k_B T}\right) \tag{21}$$

and the expression for the associated flux is:

$$J = -k_B L\,\frac{1}{z}\frac{\partial z}{\partial \Gamma} \tag{22}$$

where $L$ is an Onsager-like coefficient. After defining a diffusion coefficient $D$ and the associated affinity $A = \mu_2 - \mu_1$, the integrated rate is given as:

$$J = J_0\left(1 - \exp\frac{A}{k_B T}\right) \tag{23}$$

with $J_0 = D\exp(\mu_1/k_B T)$.

One is then able to see that MNET gives rise to nonlinear kinetic laws such as Eq. (23). In this context, MNET has been applied successfully in the past to biomolecular processes at (or below) the cellular level of description [15]. In that scenario, nonlinear kinetics are used to express, for example, RNA unfolding rates as diffusion currents, modeled via transition state theory, giving rise to Arrhenius-type nonlinear equations. In that case the current was proportional to the chemical potential difference (cf. equation 17 of reference [15]), so the entropy production was quadratic in that chemical potential gradient. We will re-examine this kind of dependency later when discussing gene expression kinetics.
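A minimal sketch of the nonlinear kinetic law of Eq. (23), in units where $k_BT = 1$, shows how it reduces to a linear (Onsager-like) law $J \approx -J_0 A/k_BT$ for small affinities and saturates at $J_0$ for large negative affinities. The value of $J_0$ is arbitrary. Since $J_0 = D\exp(\mu_1/k_BT)$, Eq. (23) can equivalently be written as $J = D\,(e^{\mu_1/k_BT} - e^{\mu_2/k_BT})$, and the second function encodes that form.

```python
import math


def mnet_rate(A, J0, kT=1.0):
    """Eq. (23): J = J0 * (1 - exp(A / kT)), with affinity A = mu2 - mu1."""
    return J0 * (1.0 - math.exp(A / kT))


def mnet_rate_from_potentials(mu1, mu2, D, kT=1.0):
    """Same law with J0 = D exp(mu1/kT): J = D (exp(mu1/kT) - exp(mu2/kT))."""
    return D * (math.exp(mu1 / kT) - math.exp(mu2 / kT))


J0 = 1.0
for A in (-5.0, -1.0, -0.01, 0.0, 0.01):
    # exact nonlinear rate versus its linear-response approximation -J0*A/kT
    print(A, mnet_rate(A, J0), -J0 * A)
```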



Since whole-genome transcriptional regulation consists of a (huge) series of biochemical reactions, many of which have unexplored chemical kinetics, a detailed MNET analysis such as the one described above is unattainable at present. In what follows, we will explore a phenomenologically based approach that nevertheless takes into account (although in a more intuitive, less explicit way) considerations similar to those of the MNET framework just sketched. This phenomenological approach is based on the Extended Irreversible Thermodynamics assumption of an enlarged space of thermodynamic variables [21, 22].



4.3 Extended Irreversible Thermodynamics 

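We shall start our discussion by assuming that a generalized entropy-like function $\eta$ exists, which may be written in the form [23, 24]:

$$\frac{d\eta}{dt} = \frac{1}{T}\left(\frac{dU}{dt} + p\,\frac{dv}{dt} - \sum_i \mu_i\,\frac{dc_i}{dt} - \sum_j \Phi_j\,\frac{dX_j}{dt}\right) \tag{24}$$

or as a differential form:

$$d_t\eta = \frac{1}{T}\Big[d_tU + p\,d_tv - \sum_i \mu_i\,d_tc_i - \sum_j \Phi_j\,d_tX_j\Big] \tag{25}$$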

We see that Eq. (25) is nothing but the formal extension of the celebrated Gibbs equation of equilibrium thermodynamics to a multicomponent out-of-equilibrium system. The quantities appearing therein are the usual ones: $T$ is the local temperature, $p$ and $v$ are the pressure and volume, etc., while $\Phi_j$ and $X_j$ are the extended thermodynamic fluxes and forces, respectively. These extended fluxes and forces are the new elements of EIT, the ones that take non-local effects into account.



In the case of a multicomponent mRNA mixture at fixed volume and pressure, we will take our set of relevant variables to consist of the temperature $T(\mathbf{r},t)$ and the concentration $c_i(\mathbf{r},t)$ of each gene species as the slowly varying (classical) parameter set $\mathcal{S}$, and the mass fluxes $\Phi_j(\mathbf{r},t)$ of these species as fast variables in the extended set $\mathcal{F}$, so that $\Omega = \mathcal{S}\cup\mathcal{F}$. These latter variables take into account the presence of inhomogeneous regions (concentration domains formed because of gene regulatory interactions) and thus correct the predictions based on the local equilibrium hypothesis. The non-equilibrium Gibbs free energy for a mixture of $i = 1, \dots, M$ mRNA transcripts at constant pressure then reads:

$$d_tG = -\eta\,d_tT + \sum_i \mu_i\,d_tc_i + \sum_j \Phi_j\,d_tX_j \tag{26}$$

If one is to consider gene expression/regulation as a chemical process, it is useful to write things in terms of the extent of reaction $\xi$. Hence $(d_tG)_{T,p,X_j} = \sum_i \mu_i\,d_tN_i$ is rewritten by means of the definition of the stoichiometric coefficient $\nu_i = dN_i/d\xi$ and of the chemical affinity $A = \sum_i \nu_i\mu_i$. The stoichiometric coefficients and the chemical affinities can be defined likewise for a set of $k = 1, \dots, R$ regulatory interactions (considered as chemical reactions) as follows:

$$d_tG = -\eta\,d_tT + \sum_k A_k\,d_t\xi_k + \sum_j \Phi_j\,d_tX_j \tag{27}$$

or

$$d_tG = -\eta\,d_tT + \sum_k\Big(\sum_i \nu_{ik}\,\mu_i\Big)d_t\xi_k + \sum_j \Phi_j\,d_tX_j \tag{28}$$
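The bookkeeping of Eqs. (27) and (28) can be sketched as follows: with a hypothetical stoichiometric matrix $\nu_{ik}$, the affinities $A_k = \sum_i \nu_{ik}\mu_i$ (with the sign convention used in the text) make $\sum_k A_k\,d_t\xi_k$ identical to $\sum_i \mu_i\,d_tN_i$ once $d_tN_i = \sum_k \nu_{ik}\,d_t\xi_k$. The species, reactions and numbers below are made up for illustration.

```python
def affinities(nu, mu):
    """A_k = sum_i nu[i][k] * mu[i], following the sign convention of the text."""
    n_species, n_reactions = len(nu), len(nu[0])
    return [sum(nu[i][k] * mu[i] for i in range(n_species)) for k in range(n_reactions)]


def reactive_dG(nu, mu, dxi):
    """Chemical term of Eq. (27): sum_k A_k * d_t xi_k."""
    return sum(a * d for a, d in zip(affinities(nu, mu), dxi))


def species_dG(nu, mu, dxi):
    """Same term written per species: sum_i mu_i * d_t N_i, with d_t N_i = sum_k nu_ik d_t xi_k."""
    dN = [sum(nu[i][k] * dxi[k] for k in range(len(dxi))) for i in range(len(mu))]
    return sum(m * d for m, d in zip(mu, dN))


# hypothetical example: 3 species taking part in 2 regulatory "reactions"
nu = [[-1, 0], [1, -1], [0, 1]]
mu = [2.0, 0.5, -1.0]
dxi = [0.3, 0.1]
print(affinities(nu, mu), reactive_dG(nu, mu, dxi), species_dG(nu, mu, dxi))
```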



4.4 Mean field approach 

In many cases the explicit stoichiometry of the regulatory interactions is unknown, and in the vast majority of the cases already studied the reactions occur on a one-to-one basis, i.e. one molecule of a transcription factor on each gene-transcription site (or one molecule of each kind of transcription factor in the case of multiply regulated gene targets). Given this, for the moment we will assume $\nu_i = 1\ \forall i$. In this dilute case the extent of each reaction is proportional to the rate of change of concentration, and we recover a non-reactive regime similar to that given by Eq. (26). It is important to stress that this approximation is not an unreasonable one, given that the usual DNA/RNA concentrations within cells are in the picomolar-nanomolar regime. Also, of the almost 30,000 different genes in humans, just a small number (about 1000-1500) are known to be transcription factors. Nevertheless, in order to take into account the scarce yet important gene regulatory interactions (albeit in an indirect manner), we retain the generalized force-flux terms to get:

$$d_tG = -\eta\,d_tT + \sum_i \mu_i\,d_tc_i + \sum_j \Phi_j\,d_tX_j \tag{29}$$

Since gene regulation occurs within the cell, it is possible to relate an internal work term to the regulation process itself, this being a far-from-equilibrium contribution. This non-local contribution is given by the generalized force-flux term (the third term on the r.h.s. of Eq. (29)). This is so because gene regulation often does not occur in situ, and also because it is the only way to take into account (albeit indirectly) the changes in the local chemical potentials that cause the long tails in the fluctuation distributions characteristic of non-equilibrium small systems (e.g. cells). The term relating mRNA flows due to transcriptional regulation can be written as a product of extended fluxes $\Phi_j$ and forces $X_j$. Here $j = 1, \dots, M$ refers to the different mRNA species being regulated; that is, indices $i$ and $j$ refer to the very same set of mRNA transcripts, but in one case ($i$) we take into account their local-equilibrium behavior (as given by their independent chemical potentials and average local concentrations), and in the other case ($j$) we are interested in their highly fluctuating (far-from-equilibrium) behavior, as given by the term $\sum_j \Phi_j\,d_tX_j$.

Now we are faced with the task of proposing a form for the extended fluxes and forces within this highly fluctuating regime that, at the same time, allows for experimental verification, is simple enough to be solved, and is compatible with the axioms of extended irreversible thermodynamics. As a first approach, we propose a system of coupled fluxes, linear in the forces and with memory, which was used to successfully characterize another highly fluctuating system, a fluid mixture near its critical point [25].

The constitutive equations are:



(29) 




(30) 




(31) 



The $\lambda$'s are time-independent, but possibly anisotropic, amplitudes; $\hat{\mathbf{u}}$ is a unit vector in the direction of mass flow (the nature of $\hat{\mathbf{u}}$ will not affect the rest of our description, since we will be dealing with the magnitude of the mass flux $|\Phi_j|$), and the $\tau$'s are the associated relaxation times, considered to be path-independent scalars. Since we have a linear relation between thermodynamic fluxes and forces, some features of the Onsager-Casimir formalism will still hold. This will be especially important when considering cross-regulatory interactions. An interesting question for future research will be whether gene transcription interactions as modeled here obey Onsager's reciprocal relations.

Dynamic coupling is given by Eqs. (30) and (31). Nevertheless, since actual transcription measurement experiments are made either in homeostatic (steady-state) settings or within time-series designs with intervals several orders of magnitude larger than the associated relaxation times (which are of the order of a few molecular collision times), it is possible to take the limits $\tau^{\Phi}_{jk} \to 0$ and $\tau^{X}_{j} \to 0$; the memory integrals then reduce to evaluated delta functions, to give:

$$\boldsymbol{\Phi}_j(\mathbf{r},t) = \hat{\mathbf{u}}\sum_k \lambda^{\Phi}_{jk}\,\mu_{j,k}(\mathbf{r},t) \tag{32}$$

$$\mathbf{X}_j(\mathbf{r},t) = \lambda^{X}_{j}\,\boldsymbol{\Phi}_j(\mathbf{r},t) \tag{33}$$

It is important to notice that in the future it will very likely become possible to measure gene expression experimentally over much shorter time intervals (maybe even in real time). In that case, the appropriate theoretical setting will be given by Eqs. (30) and (31), which represent the dynamic nature of the coupling better than Eqs. (32) and (33).
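To illustrate the $\tau \to 0$ limit invoked above, assume an exponential memory kernel $(\lambda/\tau)\,e^{-(t-t')/\tau}$; this specific kernel is an assumption made here, not necessarily the form used in Eqs. (30) and (31). For a slowly varying input (here $\sin t$ stands in for $\mu_{j,k}(t)$), the memory integral collapses onto $\lambda$ times the instantaneous value as $\tau$ shrinks, which is how the dynamic relations turn into Eqs. (32) and (33). The closed form $(\sin t - \tau\cos t)/(1+\tau^2)$ for this kernel is used as a check.

```python
import math


def memory_response(f, t, tau, lam=1.0, n=20_000):
    """lam * integral_{-inf}^{t} (1/tau) exp(-(t - t')/tau) f(t') dt', truncated at t - 20*tau.

    The exponential kernel is an assumed form, used only to illustrate the tau -> 0 limit.
    """
    lower = t - 20.0 * tau
    h = (t - lower) / n
    total = 0.0
    for k in range(n + 1):
        tp = lower + k * h
        weight = 0.5 if k in (0, n) else 1.0  # trapezoidal rule
        total += weight * math.exp(-(t - tp) / tau) / tau * f(tp)
    return lam * total * h


for tau in (1.0, 0.1, 0.001):
    # memory response at t = 1 versus the instantaneous (delta-kernel) value
    print(tau, memory_response(math.sin, 1.0, tau), math.sin(1.0))
```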

Also, due to the spatial nature of the experimental measurements (both RNA blots and DNA/RNA chips measure space-averaged mRNA concentrations), it is possible to work with the related scalar quantities instead, to give:

$$\phi_j(\mathbf{r},t) = \sum_k \lambda^{\Phi}_{jk}\,\mu_{j,k}(\mathbf{r},t) \tag{34}$$

$$X_j(\mathbf{r},t) = \lambda^{X}_{j}\,\phi_j(\mathbf{r},t) \tag{35}$$
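Substituting Eqs. (34) and (35) into Eq. (29), and retaining only the terms diagonal in the regulator index $k$, one gets:

$$d_tG = -\eta\,d_tT + \sum_i \mu_i\,d_tc_i + \sum_j\sum_k \left(\lambda^{\Phi}_{jk}\,\mu_{j,k}\right) d_t\!\left(\lambda^{X}_{j}\,\lambda^{\Phi}_{jk}\,\mu_{j,k}\right) \tag{36}$$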

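Assuming the generalized transport coefficient $\lambda^{X}_{j}$ to be independent of the flux $\Phi_j$, we are able to write:

$$d_tG = -\eta\,d_tT + \sum_i \mu_i\,d_tc_i + \sum_j\sum_k \lambda^{X}_{j}\left(\lambda^{\Phi}_{jk}\,\mu_{j,k}\right) d_t\!\left(\lambda^{\Phi}_{jk}\,\mu_{j,k}\right) \tag{37}$$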

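Or, in terms of the transcription-regulation chemical potentials $\mu_{j,k}$:

$$d_tG = -\eta\,d_tT + \sum_i \mu_i\,d_tc_i + \sum_j\sum_k \lambda^{X}_{j}\left(\lambda^{\Phi}_{jk}\,\mu_{j,k}\right)\left(\lambda^{\Phi}_{jk}\,d_t\mu_{j,k} + \mu_{j,k}\,d_t\lambda^{\Phi}_{jk}\right) \tag{38}$$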
In the constant transport coefficient approximation ($d_t\chi^{\mu}_{j,k} = 0$), Eq. 38 reads:

$$d_tG = -S\,d_tT + \sum_i \mu_i\,d_tc_i + \sum_j\sum_k \mu_{j,k}\,\chi^{\Phi}_j\,\chi^{\mu}_{j,k}\,d_t\mu_{j,k} \tag{39}$$

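Defining $L_{j,k} \equiv \tfrac{1}{2}\,\chi^{\Phi}_j\,\chi^{\mu}_{j,k}$, we obtain:

$$d_tG = -S\,d_tT + \sum_i \mu_i\,d_tc_i + \sum_j\sum_k L_{j,k}\,d_t\mu^{2}_{j,k} \tag{40}$$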



From Eq. 40 it can be seen that genetic transcription may be characterized as a second-order
effect. This arises from the fact that the actual mechanism of gene expression is regulation by
other gene products, such as enzymes and transcription factors.

$$d_tG = -S\,d_tT + \sum_i \mu_i\,d_tc_i + \sum_j\sum_k L_{j,k}\,d_t\mu^{2}_{j,k} \tag{41}$$
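The quadratic dependence in Eqs. 40 and 41 comes from an elementary identity. As a sketch, assume (as in the constant transport coefficient approximation) that $\chi^{\Phi}_j$ and $\chi^{\mu}_{j,k}$ do not depend on time. Take $L_{j,k} = \tfrac{1}{2}\chi^{\Phi}_j\chi^{\mu}_{j,k}$ as the coefficient that absorbs the factor one half. The last term of Eq. 39 then becomes a total time derivative of $\mu^2_{j,k}$:

$$\mu_{j,k}\,\chi^{\Phi}_{j}\,\chi^{\mu}_{j,k}\,d_t\mu_{j,k} = \tfrac{1}{2}\,\chi^{\Phi}_{j}\,\chi^{\mu}_{j,k}\,d_t\!\left(\mu_{j,k}^{2}\right) = L_{j,k}\,d_t\mu^{2}_{j,k}$$

As a hypothetical illustration, suppose $\mu_{j,k} = \kappa t$ grows linearly in time. The regulatory contribution to $d_tG$ is then $2L_{j,k}\kappa^{2}t$, which is second order in the coupling strength $\kappa$.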

As we have stated, fluorescence intensity signals measured by, for example, microarray
experiments (i.e., gene chips) are the usual means of acquiring information about the
concentration of a given transcript under certain cellular conditions. From Eq. 2, the
concentration of a given gene probe (with hybridization energy $\Delta G_i$) is a function of the
intensity as follows:

$$c_i = \frac{\varphi_i}{e^{-\beta\Delta G_i}\,(A_i - \varphi_i)} \tag{42}$$

It is not unreasonable to assume that the local single-species energy of formation of a given
mRNA transcript (i.e., the partial chemical potential $\mu_i$ in Eq. 41) has the same (absolute)
value as the chemical potential of hybridization for the same mRNA species, as given by Eq. 13.
Then $\mu_i = +RT/c_i$ can be used in the thermodynamic characterization of gene expression given by
Eq. 41. Inserting Eq. 42 into Eq. 13 (with $RT = 1/\beta$), we get:

$$\mu_i = \frac{e^{-\beta\Delta G_i}\,(A_i - \varphi_i)}{\beta\,\varphi_i} \tag{43}$$



Now, by taking the time derivative of Eq. 42, we obtain:

$$\frac{dc_i}{dt} = \frac{A_i}{e^{-\beta\Delta G_i}\,(A_i - \varphi_i)^{2}}\,\frac{d\varphi_i}{dt} \tag{44}$$
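Eqs. 42 and 44 follow from inverting the isotherm of Eq. 2. Eq. 2 lies outside this section, so the sketch below assumes a Langmuir form, $\varphi_i = A_i c_i e^{-\beta\Delta G_i}/(1 + c_i e^{-\beta\Delta G_i})$. That form is consistent with Eq. 42 and with $A_i$ acting as the saturation intensity. The sketch checks two things: that the inversion round-trips, and that the analytic derivative in Eq. 44 matches a central finite difference. The function names are hypothetical. The numbers are the SYK values quoted later in this section.

```python
import math

BETA = 1.622507e-6   # mol/kcal, value quoted in the text for T = 37 C
DG_SYK = 483.55      # kcal/mol, Table 1
A_SYK = 5513.0       # saturation intensity for SYK


def intensity(c: float, a: float, dg: float, beta: float = BETA) -> float:
    """Assumed Langmuir form of Eq. 2: probe intensity from concentration."""
    x = c * math.exp(-beta * dg)
    return a * x / (1.0 + x)


def concentration(phi: float, a: float, dg: float, beta: float = BETA) -> float:
    """Eq. 42: concentration recovered from an intensity 0 < phi < a."""
    return phi / (math.exp(-beta * dg) * (a - phi))


def dc_dphi(phi: float, a: float, dg: float, beta: float = BETA) -> float:
    """Eq. 44 without the dphi/dt factor."""
    return a / (math.exp(-beta * dg) * (a - phi) ** 2)


if __name__ == "__main__":
    phi = 300.0
    c = concentration(phi, A_SYK, DG_SYK)
    print(intensity(c, A_SYK, DG_SYK))  # round trip back to phi
    h = 1e-3
    numeric = (concentration(phi + h, A_SYK, DG_SYK)
               - concentration(phi - h, A_SYK, DG_SYK)) / (2 * h)
    print(numeric, dc_dphi(phi, A_SYK, DG_SYK))  # should agree closely
```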



By substituting Eqs. 43 and 44 into Eq. 41, we obtain:

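$$d_tG = -S\,d_tT + \sum_i \frac{A_i e^{-\beta\Delta G_i}}{\beta A_i e^{-\beta\Delta G_i}\varphi_i - \beta e^{-\beta\Delta G_i}\varphi_i^{2}}\,d_t\varphi_i + \sum_j\sum_k L_{j,k}\,d_t\mu^{2}_{j,k} \tag{45}$$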

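If we define

$$\Gamma_i = \frac{A_i e^{-\beta\Delta G_i}}{\beta A_i e^{-\beta\Delta G_i}\varphi_i - \beta e^{-\beta\Delta G_i}\varphi_i^{2}}$$

as the thermodynamic conjugate variable to the probe intensity $\varphi_i$, we obtain: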

$$d_tG = -S\,d_tT + \sum_i \Gamma_i\,d_t\varphi_i + \sum_j\sum_k L_{j,k}\,d_t\mu^{2}_{j,k} \tag{46}$$
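The conjugate variable $\Gamma_i$ is the product of $\mu_i$ from Eq. 43 and $dc_i/d\varphi_i$ from Eq. 44. The sketch below composes the two numerically. It compares the result both with the expanded definition of $\Gamma_i$ and with the simplified form $A_i/[\beta\varphi_i(A_i-\varphi_i)]$, which is an algebraic rearrangement of it. Like the text, it takes $\mu_i = RT/c_i$ with $RT = 1/\beta$ per mole. The function names are hypothetical.

```python
import math

BETA = 1.622507e-6  # mol/kcal, as quoted in the text


def mu(phi: float, a: float, dg: float, beta: float = BETA) -> float:
    """Eq. 43: mu_i = e^{-beta dG}(A - phi) / (beta phi), i.e. RT / c_i."""
    return math.exp(-beta * dg) * (a - phi) / (beta * phi)


def dc_dphi(phi: float, a: float, dg: float, beta: float = BETA) -> float:
    """Eq. 44 without the time-derivative factor."""
    return a / (math.exp(-beta * dg) * (a - phi) ** 2)


def gamma(phi: float, a: float, dg: float, beta: float = BETA) -> float:
    """Gamma_i as defined before Eq. 46, in its expanded form."""
    e = math.exp(-beta * dg)
    return a * e / (beta * a * e * phi - beta * e * phi ** 2)


if __name__ == "__main__":
    for phi in (100.0, 1000.0, 5000.0):
        composed = mu(phi, 5513.0, 483.55) * dc_dphi(phi, 5513.0, 483.55)
        closed = 5513.0 / (BETA * phi * (5513.0 - phi))
        print(phi, composed, gamma(phi, 5513.0, 483.55), closed)
```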
5 Results and Discussion

Let us examine the structure of Eq. 46 in some detail. In the isothermal, non-regulated steady state
(i.e., $d_tG = 0$, $d_tT = 0$ and $d_t\mu^{2}_{j,k} = 0$ for all $j, k$), Eq. 46 is nothing but a formal
non-equilibrium extension of the Gibbs-Duhem relation $\sum_i \Gamma_i\,d_t\varphi_i = 0$. Without any gene
regulatory mechanism, and without explicit dissipation, the energetics of gene expression
within a cell are just those of a non-interacting dilute mixture of its components (in this
case, the different mRNA transcripts). A more realistic case is the regulated, isothermal steady
state given by $d_tG = 0$, $d_tT = 0$ and at least some $d_t\mu^{2}_{j,k} \neq 0$. This is the more interesting
case, and the one that can be compared with present-day gene transcription experiments. The
isothermal assumption is justified here because temperature changes are subtle or negligible
within the living cell or inside a realistic biological assay. This follows from the specific
nature of nucleic acids (both DNA and RNA undergo thermal decay) and from physiological conditions.

The steady-state condition reflects present-day experimental practice rather than a fundamental
limitation. Most dynamic gene expression studies today are performed as time series (time-courses,
in biomedical terminology), with time steps dictated by economic or pharmacological rather than
biophysical considerations. Typically, the smallest time steps are of the order of minutes, if not
hours or days. Regulatory changes can thus be measured only through their steady-state mean-field
contributions (coarse-grained in space and time), not in their full dynamical complexity. Of
course, as the costs of microarray processing fall and the technologies advance, one expects
to see higher-resolution time series for transcriptional dynamics.

Let us then consider the regulated isothermal steady-state version of Eq. 46, namely:

$$0 = \sum_i \Gamma_i\,d_t\varphi_i + \sum_j\sum_k L_{j,k}\,d_t\mu^{2}_{j,k} \tag{47}$$

One can see that changes in the mRNA concentration of gene $i$, as measured by its probe
intensity, may depend not only on its own characteristic thermodynamic parameters ($A_i$, $\Delta G_i$
and $T$) but also on other mRNA transcripts (say $n$), via a coupling term $L_{n,i}\,\mu^{2}_{n,i}$. In that
case one says that the $n$-th gene regulates the $i$-th gene, or that $n$ is a transcription factor
for $i$ (conversely, $i$ is a transcriptional target of $n$).

For the sake of clarity, we now give a concrete example: the irreversible thermodynamic
coupling that sets up transcriptional regulation between two genes, Genes = {1, 2}. We assume
that gene 1 is a transcription factor for gene 2 and that gene 1 is non-regulated (i.e., gene 1
is not a target of any TF). This means that $\mu_{1,2} \neq 0$ and that $\mu_{1,1} = \mu_{2,1} = \mu_{2,2} = 0$. In this
case Eq. 47 reads:

$$0 = \Gamma_1\,d_t\varphi_1 + \Gamma_2\,d_t\varphi_2 + L_{1,2}\,d_t\mu^{2}_{1,2} \tag{48}$$

To make explicit calculations from experimental data, we consider as gene 1 the SYK transcript,
which is responsible for the synthesis of spleen tyrosine kinase. As gene 2 we consider IL2RB
(interleukin 2 receptor, beta). SYK is well known for being a strong inducer of gene transcription,
especially of the interleukin 2 receptor beta domain [26]. There is also strong evidence
indicating a possible role of these two genes in the so-called C-MYC network of reactions, a very
important, cancer-related biochemical pathway.

The values of the parameters can be calculated as follows. Using the algorithm developed by
Lu et al. [5] and described by Carlon et al. (cf. Table 1 of reference [10]), it is possible to
obtain suitable values $\Delta G_1 = 483.55$ kcal/mol and $\Delta G_2 = 463.05$ kcal/mol (see Table 1). With these
values we can calculate $A_1$ and $A_2$ from Eq. 2, using saturation measurements in the Latin square
experiments [[6l|5l@]|. In this case $A_1 = 5513$ intensity units/mol and $A_2 = 1105$ intensity
units/mol (see Figures 1 and 2).

Given these parameters, it is possible to calculate both $\Gamma_1 = \Gamma_1(\varphi_1)$ and $\Gamma_2 = \Gamma_2(\varphi_2)$ from a
time-course GEA. Via $\varphi_1(t)$ and $\varphi_2(t)$ we can then obtain the time evolution of $\mu_{1,2}$. This fully
characterizes the transcriptional regulation of this simple (almost trivial from the biological
standpoint) gene switch.

Taking the aforementioned values, we have the following expressions for the thermodynamic
functions in terms of the experimentally measurable intensities (in all cases a physiological
temperature of T = 37 °C is assumed), hence $\beta = 1.622507 \times 10^{-6}$ mol kcal$^{-1}$,
$e^{-\beta\Delta G_1} \approx 0.99922$ and $A_1 \times e^{-\beta\Delta G_1} = 5508.67950$ intensity units/mol; also $e^{-\beta\Delta G_2} = 0.99925$
and $A_2 \times e^{-\beta\Delta G_2} = 1104.17014$ intensity units/mol.
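The constants above follow from $\beta$, $\Delta G_i$ and $A_i$ by direct arithmetic, and the sketch below reproduces them. It takes $\beta$ exactly as quoted in the text rather than recomputing it from $T$. The function name `boltzmann_terms` is hypothetical.

```python
import math

BETA = 1.622507e-6  # mol/kcal, value quoted in the text for T = 37 C


def boltzmann_terms(dg: float, a: float, beta: float = BETA) -> tuple[float, float]:
    """Return (e^{-beta dG_i}, A_i e^{-beta dG_i}) for one transcript."""
    factor = math.exp(-beta * dg)
    return factor, a * factor


if __name__ == "__main__":
    # (Delta G_i in kcal/mol from Table 1, A_i in intensity units/mol)
    genes = {"SYK": (483.55, 5513.0), "IL2RB": (463.05, 1105.0)}
    for gene, (dg, a) in genes.items():
        factor, product = boltzmann_terms(dg, a)
        print(f"{gene}: e^(-beta dG) = {factor:.5f}, A e^(-beta dG) = {product:.3f}")
```

With these inputs the two factors round to 0.99922 and 0.99925. The IL2RB product agrees with the quoted 1104.17014 to better than $10^{-4}$. The SYK product lands a few thousandths below the quoted 5508.67950.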

Calculating the intensity-dependent chemical potentials from Eq. 43, we obtain (in kcal/mol):

$$\mu_1 = \frac{3.395\times10^{9} - 6.158\times10^{5}\,\varphi_1}{\varphi_1} \tag{49}$$

and

$$\mu_2 = \frac{6.805\times10^{8} - 6.159\times10^{5}\,\varphi_2}{\varphi_2} \tag{50}$$
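Eq. 43 can be written as $\mu_i = (p_i - q_i\varphi_i)/\varphi_i$, with $p_i = A_i e^{-\beta\Delta G_i}/\beta$ and $q_i = e^{-\beta\Delta G_i}/\beta$. The sketch below computes these two coefficients from the values quoted above. To the printed precision they reproduce the numbers in Eqs. 49 and 50. The function names are hypothetical.

```python
import math

BETA = 1.622507e-6  # mol/kcal, value quoted in the text


def mu_coefficients(dg: float, a: float, beta: float = BETA) -> tuple[float, float]:
    """Return (p, q) with Eq. 43 written as mu_i = (p - q * phi_i) / phi_i."""
    factor = math.exp(-beta * dg)
    return a * factor / beta, factor / beta


def mu(phi: float, dg: float, a: float, beta: float = BETA) -> float:
    """Intensity-dependent chemical potential of Eq. 43, in kcal/mol."""
    p, q = mu_coefficients(dg, a, beta)
    return (p - q * phi) / phi


if __name__ == "__main__":
    for gene, (dg, a) in {"SYK": (483.55, 5513.0), "IL2RB": (463.05, 1105.0)}.items():
        p, q = mu_coefficients(dg, a)
        print(f"{gene}: mu = ({p:.3e} - {q:.3e} phi) / phi")
```

Since $p_i/q_i = A_i$, the chemical potential vanishes exactly at the saturation intensity $\varphi_i = A_i$.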

As can be seen from Eqs. 49 and 50 (Figure 3), gene 1 (SYK) and gene 2 (IL2RB) differ in their
transcriptional behavior. SYK is a transcription factor, whereas IL2RB is not (it is, in fact, a
transcriptional target). The maximum intensity (related to a maximum concentration peak)
attainable in the spontaneous regime is 5513 intensity units for SYK, whereas for IL2RB it is just
1105 intensity units. This means that, for IL2RB to be produced at higher rates, modifications of
the chemical environment (e.g., via transcription factors) are needed.

In a very straightforward way (similar to our $\mu_i$ calculations), we can now calculate
expressions for $\Gamma_1$ and $\Gamma_2$ as follows (see Figure 4):

$$\Gamma_1 = \frac{5508.6795}{0.008938\,\varphi_1 - 1.6098\times10^{-6}\,\varphi_1^{2}} \tag{51}$$

$$\Gamma_2 = \frac{1104.17014}{0.001795\,\varphi_2 - 1.6243\times10^{-6}\,\varphi_2^{2}} \tag{52}$$
If we substitute Eqs. 51 and 52 into Eq. 48, we obtain:



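$$\frac{5508.6795\,d_t\varphi_1}{0.008938\,\varphi_1 - 1.6098\times10^{-6}\,\varphi_1^{2}} + \frac{1104.17014\,d_t\varphi_2}{0.001795\,\varphi_2 - 1.6243\times10^{-6}\,\varphi_2^{2}} + L_{1,2}\,d_t\mu^{2}_{1,2} = 0 \tag{53}$$

Integrating (with $L_{1,2}$ taken as constant and $C$ an integration constant), we obtain:

$$616321.2687\,\ln\left|\frac{-1.6098\times10^{-6}\,\varphi_1}{0.008938 - 1.6098\times10^{-6}\,\varphi_1}\right| + 615136.5683\,\ln\left|\frac{-1.6243\times10^{-6}\,\varphi_2}{0.001795 - 1.6243\times10^{-6}\,\varphi_2}\right| + L_{1,2}\,\mu^{2}_{1,2} = C \tag{54}$$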
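The step from Eq. 53 to Eq. 54 uses the partial-fraction antiderivative $\int a\,d\varphi/(b\varphi - c\varphi^2) = (a/b)\ln|\varphi/(b - c\varphi)| + \text{const}$. The prefactors 616321.2687 and 615136.5683 are simply $a/b$ for each gene. The sketch below checks this with the coefficients printed in Eqs. 51 and 52. For each logarithmic term, it compares a central finite-difference derivative with the corresponding integrand. The function names are hypothetical.

```python
import math

# (a, b, c) such that Gamma_i(phi) = a / (b*phi - c*phi**2), as printed in Eqs. 51-52
COEFFS = {
    "SYK": (5508.6795, 0.008938, 1.6098e-6),
    "IL2RB": (1104.17014, 0.001795, 1.6243e-6),
}


def gamma(phi: float, a: float, b: float, c: float) -> float:
    """Integrand of Eq. 53 for one gene."""
    return a / (b * phi - c * phi ** 2)


def log_term(phi: float, a: float, b: float, c: float) -> float:
    """One logarithmic term of Eq. 54 (antiderivative of gamma, up to a constant)."""
    return (a / b) * math.log(abs(c * phi / (b - c * phi)))


if __name__ == "__main__":
    h = 1e-4
    for gene, (a, b, c) in COEFFS.items():
        phi = 500.0
        slope = (log_term(phi + h, a, b, c) - log_term(phi - h, a, b, c)) / (2 * h)
        print(gene, round(a / b, 4), slope, gamma(phi, a, b, c))
```

The check is only meaningful for $\varphi_i < b/c$, where the denominator of $\Gamma_i$ stays positive. For both genes this covers the intensity range up to their saturation values.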



Taking experimental values of $\varphi_1(t)$ and $\varphi_2(t)$, Eq. 54 can be solved for $\mu_{1,2}(t)$. As already
stated, both SYK and IL2RB are involved in the transcriptional network related to the C-MYC
pathway, which is very important in the development of cancer.
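Solving Eq. 54 for the coupling amounts to evaluating its logarithmic terms along the measured intensity trajectories. The sketch below does this under two stated assumptions. First, $L_{1,2}$ is a known constant; the value used here is a hypothetical placeholder, not a fitted quantity. Second, because Eq. 54 does not fix the integration constant, only the change of $\mu^2_{1,2}$ relative to the first time point is reported. The intensity series are hypothetical illustrations, not experimental data, and all function names are hypothetical.

```python
import math

# Coefficients (a, b, c) of Gamma_i = a / (b*phi - c*phi**2) from Eqs. 51-52.
SYK = (5508.6795, 0.008938, 1.6098e-6)
IL2RB = (1104.17014, 0.001795, 1.6243e-6)


def log_term(phi: float, a: float, b: float, c: float) -> float:
    """Antiderivative of Gamma_i (one term of Eq. 54), up to a constant."""
    return (a / b) * math.log(abs(c * phi / (b - c * phi)))


def delta_mu12_squared(phi1_series: list[float], phi2_series: list[float],
                       l12: float) -> list[float]:
    """Change of mu_{1,2}^2 relative to the first sample.

    Assumes Eq. 48 holds with constant L12, so that
    F1 + F2 + L12 * mu12^2 is conserved along the time course.
    """
    f0 = log_term(phi1_series[0], *SYK) + log_term(phi2_series[0], *IL2RB)
    out = []
    for p1, p2 in zip(phi1_series, phi2_series):
        f = log_term(p1, *SYK) + log_term(p2, *IL2RB)
        out.append(-(f - f0) / l12)
    return out


if __name__ == "__main__":
    # Hypothetical intensity time courses (not experimental data).
    phi_syk = [800.0, 900.0, 1000.0, 1100.0]
    phi_il2rb = [300.0, 320.0, 350.0, 390.0]
    L12 = 1.0e4  # hypothetical constant coupling coefficient
    print(delta_mu12_squared(phi_syk, phi_il2rb, L12))
```

Recovering $\mu_{1,2}(t)$ itself, rather than its squared change, would additionally require one reference value of $\mu_{1,2}$. That value would fix $C$.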

Capturing more subtle regulatory dynamics will require experiments with many more measurements at
shorter time steps. In principle, however, detailed patterns can be observed even within this
very simple thermodynamic model.

Interestingly, for this simple two-gene switch it is also possible to calculate how the
transcriptional regulation coupling $\mu_{1,2}$ depends on the particular cellular environment. This
is done by solving Eq. 54 for the same two genes under different phenotypic conditions (e.g., cancer
versus normal cells, or treated versus untreated diseased cells). The systematic study of such a
thermodynamic theory of context-dependent transcriptional regulation seems to be a promising
research area in the non-equilibrium thermodynamics of biosystems. In conclusion, we have shown
here that a non-equilibrium thermodynamic description of cell-level transcriptional regulation
can be formulated in terms of experimentally measurable quantities, and that essential features
of gene regulatory dynamics can be studied with it. The model has been progressively simplified to
match today's technological and practical limitations. These simplifications are not necessary
in principle, however, and can be removed once better experimental resolution (especially more
samples and time points) is attained.
Table 1: Thermodynamic data for gene transcripts included in the Latin Square experiments.
$\Delta G_{tr}$ values at 37 °C are calculated according to reference [4].

| probeset_key | Gene | Gene Name | Transcription Factor activity | ΔG_tr (kcal/mol) |
|---|---|---|---|---|
| 203508_at | TNFRSF1B | tumor necrosis factor receptor superfamily, member 1B | | 502.52 |
| 204563_at | SELL | selectin L | | 446.34 |
| 204513_s_at | ELMO1 | engulfment and cell motility 1 | | 471.35 |
| 204205_at | APOBEC3G | apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3G | Reverse TF | 477.38 |
| 204959_at | MNDA | myeloid cell nuclear differentiation antigen | TF Regulation | 433.97 |
| 207655_s_at | BLNK | B-cell linker | | 436.55 |
| 204836_at | GLDC | glycine dehydrogenase (decarboxylating) | | 468.99 |
| 205291_at | IL2RB | interleukin 2 receptor, beta | | 463.05 |
| 209795_at | CD69 | CD69 molecule | | 398.72 |
| 207777_s_at | SP140 | SP140 nuclear body protein | TF activity | 700.57 |
| 204912_at | IL10RA | interleukin 10 receptor, alpha | | 474.56 |
| 205569_at | LAMP3 | lysosomal-associated membrane protein 3 | | 636.01 |
| 207160_at | IL12A | interleukin 12A | | 453.31 |
| 205692_s_at | CD38 | CD38 molecule | | 569.56 |
| 212827_at | IGHM | immunoglobulin heavy constant mu | | 482.07 |
| 209606_at | PSCDBP | cytohesin 1 interacting protein | | 458.19 |
| 205267_at | POU2AF1 | POU class 2 associating factor 1 | TF Regulation | 473.5 |
| 204417_at | GALC | galactosylceramidase | | 410.95 |
| 205398_s_at | SMAD3 | SMAD family member 3 | TF activity + Binding | 465.08 |
| 209734_at | NCKAP1L | NCK-associated protein 1-like | | 716.67 |
| 209354_at | TNFRSF14 | tumor necrosis factor receptor superfamily, member 14 | | 782.09 |
| 206060_s_at | PTPN22 | protein tyrosine phosphatase, non-receptor type 22 | | 414.77 |
| 205790_at | SKAP1 | src kinase associated phosphoprotein 1 | TF | 452.57 |
| 200665_s_at | SPARC | secreted protein, acidic, cysteine-rich (osteonectin) | | 443.24 |
| 207641_at | TNFRSF13B | tumor necrosis factor receptor superfamily, member 13B | TF inducer | 481.8 |
| 207540_s_at | SYK | spleen tyrosine kinase | TF inducer | 483.55 |
| 204430_s_at | SLC2A5 | solute carrier family 2 (facilitated glucose/fructose transporter), member 5 | | 488.56 |
| 203471_s_at | PLEK | pleckstrin | | 459.23 |
| 204951_at | RHOH | ras homolog gene family, member H | TF Regulation | 467.94 |
| 207968_s_at | MEF2C | myocyte enhancer factor 2C | TFact, RNAPol ind | 472.81 |


[Figure panel: Mean intensity versus transcript concentration]

Figure 1: Gene expression intensity as a function of mRNA concentration for SYK and IL2RB

Figure 2: Intensity amplitude coefficient as a function of mRNA concentration for SYK and
IL2RB

Figure 3: Individual chemical potentials for non-regulated transcription, $\mu_{\mathrm{SYK}}$ and $\mu_{\mathrm{IL2RB}}$, as
a function of gene expression intensity






Gene expression thermodynamics 






Figure 4: Intensity parameters for non-regulated transcription, $\Gamma_{\mathrm{SYK}}$ and $\Gamma_{\mathrm{IL2RB}}$, as a function
of gene expression intensity



